NoteBook

2024

Distribution Shifts

Supervised Pretraining Can Learn In-Context Reinforcement Learning

IN-CONTEXT EXPLORATION-EXPLOITATION FOR REINFORCEMENT LEARNING

TABPFN A TRANSFORMER THAT SOLVES SMALL TABULAR CLASSIFICATION PROBLEMS IN A SECOND

TRANSFORMERS CAN DO BAYESIAN INFERENCE

ALECE An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads (Extended)

PostgreSQL 14 Internals

Efficient Memory Management for Large Language Model Serving with PagedAttention

Decision Transformer Reinforcement Learning via Sequence Modeling

Summary of RL papers I have learned

CS234 reinforcement learning

An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning

Lero A Learning-to-Rank Query Optimizer

Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes

QTune A Query-Aware Database Tuning System with Deep Reinforcement Learning

Query Performance Prediction for Concurrent Queries using Graph Embedding

Cost-based or Learning-based A Hybrid Query Optimizer for Query Plan Selection

Multimodal-GPT A vision language model for dialogue with humans

Multimodal dialogue response generation

PilotScope Steering Databases with Machine Learning Drivers

A Learned Query Rewrite System using Monte Carlo Tree Search

Looking Ahead Makes Query Plans Robust

Simple Adaptive Query Processing vs Learned Query Optimizers

A SIMPLE NEURAL ATTENTIVE META-LEARNER

AWARE Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems

TRUSTWORTHY LLMS A SURVEY AND GUIDELINE FOR EVALUATING LARGE LANGUAGE MODELS ALIGNMENT

Denoising Diffusion Probabilistic Model

LFM data collection

Neo A Learned Query Optimizer

Steering Query Optimizers A Practical Take on Big Data Workloads

Deploying a Steered Query Optimizer in Production at Microsoft

QO-Insight Inspecting Steered Query Optimizers

AutoSteer Learned Query Optimization for Any SQL Database

A Learned Query Rewrite System using Monte Carlo Tree Search

Leveraging Query Logs and Machine Learning for Parametric Query Optimization

Kepler Robust Learning for Faster Parametric Query Optimization

2023

Robust Query-Driven Cardinality Estimation under Changing Workloads

Learned Cardinality Estimation for Similarity Queries

Large Language Models

Attention Is All You Need

Gemini (Google)

Revisiting Deep Learning Models for Tabular Data

YESQL

Balanced Byzantine Reliable Broadcast with Near-Optimal Communication and Improved Computation

Asynchronous Data Dissemination and its Applications

Multidimensional agreement in Byzantine systems

MST algorithm

Atomic Cross-Chain Swaps

An Improved Distributed Algorithm for Maximal Independent Set

Maximal Independent Set

HotStuff

Asymptotically Optimal Validated Asynchronous Byzantine Agreement

On the Expressive Power of Deep Neural Networks

YeSQL You extend SQL with Rich and Highly Performant User-Defined Functions in Relational Databases

Distributed Deep Learning on Data Systems A Comparative Analysis of Approaches

PostgresqlML

ModelKeeper Accelerating DNN Training via Automated Training Warmup

Taurus Lightweight Parallel Logging for In-Memory Database Management Systems

Overview of SciDB

Pump Up the Volume Processing Large Data on GPUs with Fast Interconnects

Spangle A Distributed In-Memory Processing System for Large-Scale Arrays

C-Store A Column-oriented DBMS

ColumnML Column-Store Machine Learning with On-The-Fly Data Transformation

H-Store A High-Performance, Distributed Main Memory Transaction Processing System

Pruning neural networks without any data by iteratively conserving synaptic flow

Dynamo Amazon's Highly Available Key-value Store

Attention Is All You Need

Orca A Distributed Serving System for Transformer-Based Generative Models

Best of Both Worlds Model Selection

Amazon DynamoDB A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

ANYTIME NEURAL NETWORK A VERSATILE TRADEOFF BETWEEN COMPUTATION AND ACCURACY

Citus Distributed PostgreSQL for Data-Intensive Applications

State Management in Apache Flink

Trisk Task-Centric Data Stream Reconfiguration

Sharing Buffer Pool Memory in Multi-Tenant Relational Database-as-a-Service

What's Really New with NewSQL

S-Store Streaming Meets Transaction Processing

TabNAS Rejection Sampling for Neural Architecture Search on Tabular Datasets

The MADlib Analytics Library or MAD Skills, the SQL

Towards a Unified Architecture for in-RDBMS Analytics

Hybrid In Database Inference for Declarative Information Extraction

Vertica ML Distributed Machine Learning in Vertica Database

End-to-end Optimization of Machine Learning Prediction Queries

Building An Elastic Query Engine on Disaggregated Storage

Query Processing on Tensor Computation Runtimes

MLbase A Distributed Machine-learning System

MLlib Machine Learning in Apache Spark

XuanYuan An AI-Native Database

Database Meets AI Survey. AI Meets Database AI4DB and DB4AI

Machine Learning for Databases

PostCENN PostgreSQL with Machine Learning Models for Cardinality Estimation

External vs. Internal An Essay on Machine Learning Agents for Autonomous Database Management Systems

High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems

Data Parallel Actors, A Programming Model for Scalable Query Serving Systems

2022

Graph Masked Autoencoder Enhanced Predictor for Neural Architecture Search

Inference Serving

FlexChain An Elastic Disaggregated Blockchain

MArk Exploiting Cloud Services for Cost-Effective and SLO-Aware Machine Learning Inference Serving

Serving Deep Learning Models with Deduplication from Relational Databases

TETRIS Memory efficient Serverless Inference through Tensor Sharing

Algorand-blog2

Hybrid Transactional Analytical Processing A Survey

Algorand Scaling Byzantine Agreements for Cryptocurrencies

PolarDB Serverless A Cloud Native Database for Disaggregated Data Centers

Byzantine Agreement Made Trivial

F1 Lightning HTAP as a Service

LedgerDB A Centralized Ledger Database for Universal Audit and Verification

GlassDB An Efficient Verifiable Ledger Database System Through Transparency

ForkBase An Efficient Storage Engine for Blockchain and Forkable Applications

Neural Architecture Search as Program Transformation Exploration

Summary of the model parallelism-based training and optimization

Memory Efficient Pipeline-Parallel DNN Training

PipeDream Generalized Pipeline Parallelism for DNN Training

Bamboo Making Preemptible Instances Resilient for Affordable Training of Large DNNs

Dorylus Affordable, Scalable and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

A Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform

Intel SGX post

Azure SQL Database Always Encrypted

Operon An Encrypted Database for Ownership-Preserving

TiDB A Raft-based HTAP Database

ByteHTAP ByteDance HTAP System with High Data Freshness and Strong Data Consistency

Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing

Streamlet Textbook Streamlined Blockchains

Practical Byzantine Fault Tolerance

Neural Architecture Search A Survey

How Powerful are Performance Predictors in Neural Architecture Search

NASPipe High Performance and Reproducible Pipeline Parallel Supernet Training via Causal Synchronous Parallelism

AttentiveNAS Improving Neural Architecture Search via Attentive Sampling

NEURAL ARCHITECTURE SEARCH ON IMAGENET IN FOUR GPU HOURS A THEORETICALLY INSPIRED PERSPECTIVE

KNAS Green Neural Architecture Search

ZERO-COST PROXIES FOR LIGHTWEIGHT NAS

HYDROZOA DYNAMIC HYBRID-PARALLEL DNN TRAINING ON SERVERLESS CONTAINERS

GPipe Easy Scaling with Micro-Batch Pipeline Parallelism

Cerebro A Data System for Optimized Deep Learning Model Selection

Dremel made simple with Parquet

Dremel Interactive Analysis of Web-Scale Datasets

Lakehouse A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

HNAS

SeBS A Serverless Benchmark Suite for Function-as-a-Service Computing

Knative Serving overview

Faastlane Accelerating Function-as-a-Service Workflows

Serverless Computing One Step Forward Two Steps Back

SAND Towards High-Performance Serverless Computing

Cohort Query Processing

Towards Demystifying Serverless Machine Learning Training

Neural Architecture Search without Training

NAS-BENCH-201 EXTENDING THE SCOPE OF REPRODUCIBLE NEURAL ARCHITECTURE SEARCH

The Google File System

Chubby

NAS-Bench-101 Towards Reproducible Neural Architecture Search

Large-scale cluster management at Google with Borg

BOHB Robust and Efficient Hyperparameter Optimization at Scale

HYPERBAND BANDIT-BASED CONFIGURATION EVALUATION FOR HYPERPARAMETER OPTIMIZATION

Practical Bayesian Optimization of Machine Learning Algorithms

Auto-PyTorch, Multi-Fidelity MetaLearning for Efficient and Robust AutoDL

Google Vizier A Service for Black-Box Optimization

Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale

Graph neural networks A review of methods and applications

A Gentle Introduction to Graph Neural Networks

PaSca a Graph Neural Architecture Search System under the Scalable Paradigm

MODULARNAS TOWARDS MODULARIZED AND REUSABLE NEURAL ARCHITECTURE SEARCH

NAS-BENCH-SUITE NAS EVALUATION IS (NOW) SURPRISINGLY EASY

ZooKeeper Wait-free coordination for Internet-scale systems

ServeDB Secure, Verifiable, and Efficient Range Queries on Outsourced Database

Efficient Neural Architecture Search via Parameter Sharing

Progressive Neural Architecture Search

Learning Transferable Architectures for Scalable Image

DARTS DIFFERENTIABLE ARCHITECTURE SEARCH

Retiarii A Deep Learning Exploratory-Training Framework

Paxos Made Moderately Complex

INFaaS Automated Model-less Inference Serving

IntegriDB Verifiable SQL for Outsourced Databases

Paxos Made Simple

Cool a COhort OnLine analytical processing system

Spitz A Verifiable Database System

Chain Replication for Supporting High Throughput and Availability

The design of a practical system for fault-tolerant virtual machines

Neural Architecture Search A Survey

The Role of Distributed State

Distributed Snapshots Determining Global States of Distributed Systems

Time, Clocks, and the Ordering of Events in a Distributed System

PipeSwitch Fast Pipelined Context Switching for Deep Learning Applications

Microsecond Consensus for Microsecond Applications

2021

Building Scalable and Flexible Cluster Managers Using Declarative Programming

NASI Label and Data-agnostic Neural Architecture Search at Initialization

Enabling SQL-based Training Data Debugging for Federated Learning

Privacy Preserving Vertical Federated Learning for Tree-based Models

Programmable Calendar Queues for High-speed Packet Scheduling

A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency

Load balancing and proxying

RocksDB

Spanner Google's Globally-Distributed Database

distributed-system-model

A TRANSACTIONAL PERSPECTIVE ON EXECUTE-ORDER-VALIDATE BLOCKCHAINS

It's the niceties that make the difference. Fate gives us the hand, and we play the cards.